FOREWORD



This report, submitted in compliance with Article 3 of the contract, 
reports on technical activities of Project ABLE during its fifth quarter 
of operation, 1 April through 30 June 1966. A brief overview of the 
project is presented first, followed by a report summary. The major por- 
tion of the report is a discussion of the development of performance 
measures to be used to assess students' achievement of the objectives of 
instruction. Project plans for next quarter are outlined. 
OVERVIEW: Project ABLE



A Joint Research Project of: Public Schools of Quincy, Massachusetts
and American Institutes for Research



Title: DEVELOPMENT AND EVALUATION OF AN EXPERIMENTAL CURRICULUM FOR
THE NEW QUINCY (MASS.) VOCATIONAL-TECHNICAL SCHOOL



Objectives: The principal goal of the project is to demonstrate increased
effectiveness of instruction whose content is explicitly derived from
analysis of desired behavior after graduation, and which, in addition, 
attempts to apply newly developed educational technology to the design, 
conduct, and evaluation of vocational education. Included in this new 
technology are methods of defining educational objectives, deriving 
topical content for courses, preparation of students in prerequisite 
knowledges and attitudes, individualizing instruction, measuring student 
achievement, and establishing a system for evaluating program results 
in terms of outcomes following graduation. 

Procedure: The procedure begins with the collection of vocational information
for representative jobs in eleven different vocational areas.
Analysis will then be made of the performances required for job execution,
resulting in descriptions of essential classes of performance which need 
to be learned. On the basis of this information, a panel of educational 
and vocational scholars will develop recommended objectives for a vocational
curriculum which incorporates the goals of (a) vocational competence;
(b) responsible citizenship; and (c) individual self-fulfillment. A
curriculum then will be designed in topic form to provide for comprehen- 
siveness, and also for flexibility of coverage, for each of the vocational 
areas. Guidance programs and prerequisite instruction to prepare junior 
high students also will be designed. Selection of instructional materials, 
methods, and aids, and design of materials, when required, will also be 
undertaken. An important step will be the development of performance 
measures tied to the objectives of instruction. Methods of instruction 
will be devised to make possible individualized student progression and
selection of alternative programs, and teacher-training materials will
be developed to accomplish inservice teacher education of Quincy School
personnel. A plan will be developed for conducting program evaluation 
not only in terms of end-of-year examinations, but also in terms of con- 
tinuing follow-up of outcomes after graduation. 

Time Schedule: Begin 1 April 1965
               Complete 31 March 1970
               Present Contract to 30 June 1967
REPORT SUMMARY



During the present reporting period, technical activity was directed 
primarily to (1) continued development of junior high guidance program
materials and completion of arrangements for program implementation, (2)
completion of course and topic objectives in some curriculum areas, and
(3) the beginning of development of measures for verifying students'
achievement of instructional objectives. The present report is concerned
with achievement measures. It reviews the curriculum structure and instructional
methods which have been planned and identifies a number of important
roles for which achievement measures are needed. The technical requirements 
for measures employed in those roles are examined and the procedures for 
developing such measures are discussed. 

During the next quarter, test development will occupy a greater pro- 
portion of total activity. Selection and development of instructional 
materials, aids, and procedures will continue concurrent with the develop- 
ment of measures. Junior high guidance preparations will be completed and 
the program will be initiated. 



THE ROLES, CHARACTERISTICS, AND DEVELOPMENT PROCEDURES 
FOR MEASURES OF INDIVIDUAL ACHIEVEMENT 

Perhaps the most distinctive characteristic of Project ABLE to date 
is its persistent focus on the performance capabilities of students. This 
emphasis was established at the outset by the statement of the project’s 
purpose which was, in part, to evaluate the effectiveness of a curriculum 
derived explicitly from the behavior desired of graduates. It was taken 
as fundamental that education aims primarily to produce learning by students;
that learning involves changes in the capabilities of students, that
is, that a student has learned when he can demonstrate a capability which
he could not demonstrate before the learning experience; and that the basic 
design task of the project was to select the demonstrable capabilities 
desired of students and to establish conditions under which those capabil- 
ities could be acquired efficiently. 

Adherence to this primary purpose, and to the rather simple assumptions 
associated with it, has led us over new routes to results quite different 
from the usual products of curriculum development. Previous reports 
(American Institutes for Research, 1965a, 1965b, 1965c, 1966) describing 
the development procedures, instructional objectives, curriculum outlines, 
and guidance programs of the project reveal the differences in curriculum 
design. That work cannot be recounted here, but it should be noted that
the statement of instructional objectives in behavioral terms was the key 
to most differences between Project ABLE products and those more commonly 
obtained. Vague and uncertain statements about what the student should 
learn were avoided in favor of clear statements about what he should be 
able to do. Objectives were identified as the content of the curriculum,
and content was distinguished thereby from the conditions under which
learning would take place (e.g., teachers' activities, instructional methods,
materials, aids, procedures).

The design of the curriculum, then, has proceeded in accordance with 
the original purpose. But the implementation of the curriculum and the
evaluation of its effectiveness require means for assessing the performance
capabilities acquired by the students. The remainder of this report is
devoted to consideration of the problems of performance measurement. Fol- 
lowing a brief description of the curriculum structure and the educational 
methods which are relevant to the problems of measurement, the discussion 
is organized around three major topics. First to be considered are the 
roles assigned to performance measures. This review of functional require- 
ments leads to an examination of the principal technical characteristics 
which the measures must have to play their intended roles. Finally, spe- 
cific procedures are summarized for developing operational measures. 

Curriculum Structure and Instructional Methods 

Curriculum is being developed in 16 areas: 11 vocational areas,
4 "academic" areas, and a new area called basic technology. In each, the
content will consist of a set of objectives stated in terms of the capabil- 
ities to be demonstrated by successful students as a result of prescribed 
learning experiences. The objectives are being organized hierarchically. 
That is, each area has a set of "course" objectives at the top of the
hierarchy. These are the end capabilities toward which all earlier learning 
is to be directed. Each course objective has subordinate "topic" objectives 
which are statements of prerequisite capabilities. Objectives subordinate 
to topic objectives are provided when required. The learning sequence will 
extend at one end to the lowest capability level expected of entering students,
and at the other end to the highest capability level for which
training is to be provided. The curriculum structure conforms in general 
with the hierarchical concepts described by Gagne (1965). 

The sequence of learning objectives is being defined in accordance with 
two major considerations. The first consideration has been suggested above 
in reference to prerequisite capabilities. That is, some capabilities
occur later in a sequence because the student cannot acquire them unless he 
is able already to do the things specified in earlier objectives. For 
example, the student cannot learn to solve systems of linear equations until 
he has learned to perform simple algebraic manipulations. In the vocational 
areas, a second reason for the sequences is that the learning objectives 
are intended to parallel the structure of jobs selected for training. Thus,
in each vocational area, a sequence of jobs was chosen such that a large 
proportion of the skills and knowledges of any job also are required for 
successful performance of jobs later in the sequence. In this arrangement, 
a student qualifies for successively higher-level jobs as he progresses
through the learning sequence. Within broad limits, each student thus would 
have marketable skill whenever he leaves school. 

The curriculum structure is intended to be at least compatible with 
individualized instructional methods. It is planned that each student will 
proceed through an individually prescribed learning sequence, advancing to 
the next objective as he demonstrates that he has acquired the prerequisite 
capabilities. It is planned also that lectures by the teacher will be 
minimized in favor of individual study, small group discussions, demonstra- 
tions, and tutorial work. 

With this brief review of curriculum structure and instructional 
methods as background, we turn now to consideration of the roles of perfor- 
mance measurement. 

Roles for Performance Measurement 

This section is concerned with the several roles which it is expected 
that performance measures will play in the conduct and evaluation of the 
experimental curricula. These roles are the "why" of performance measure- 
ment, and the descriptions which follow indicate the uses to which we 
expect to put the results of measurement. 

Diagnosis. If students are to work their various ways through individually
prescribed sequences of hierarchical objectives, it is important
that the first event in each student's experience with a curriculum area 
be a determination of his present capabilities. What is required is a 
diagnostic report which locates the student's proper starting place in the 
curriculum by identifying the relevant capabilities he can demonstrate and 
those he has not yet learned. With this information available, the teacher 
can identify the learning assignments which the student should attempt in 
order to proceed efficiently toward his educational objectives. This
diagnostic test may reveal that the student lacks some essential capability
normally acquired in junior high school. In such a case, the appropriate
assignment would provide the student with the opportunity to acquire that 
capability before attempting to meet objectives which depend on the skill 
or knowledge which is missing from his repertoire. In other cases, entering 
students may demonstrate some capabilities which are well in advance of the
usual starting place in the curriculum. The appropriate assignment for 
these students would provide them with the opportunity to build on their 
past learning without having to go through material and exercises which 
would not add to their existing capabilities. The diagnostic measurement 
of performance capabilities is an important role because it provides the 
basis for individually prescribed sequences of learning assignments. 
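
The diagnostic step described above can be pictured as a lookup against the hierarchy of objectives. The following is a minimal sketch only, not a procedure taken from the project: the objective names and the `PREREQUISITES` table are hypothetical, and the function simply lists the objectives a student has not yet demonstrated but for which he has demonstrated every prerequisite.

```python
# Hypothetical hierarchy: each objective maps to its prerequisite objectives.
PREREQUISITES = {
    "arithmetic": [],
    "simple_algebra": ["arithmetic"],
    "linear_systems": ["simple_algebra"],
    "read_blueprint": ["arithmetic"],
    "machine_part": ["read_blueprint", "linear_systems"],
}


def starting_assignments(prerequisites, demonstrated):
    """Return objectives not yet demonstrated whose prerequisites all are."""
    demonstrated = set(demonstrated)
    return sorted(
        objective
        for objective, needs in prerequisites.items()
        if objective not in demonstrated
        and all(need in demonstrated for need in needs)
    )


# A student who already demonstrates arithmetic and simple algebra
# would start beyond the usual entry point.
print(starting_assignments(PREREQUISITES, {"arithmetic", "simple_algebra"}))
```

A student who demonstrates none of the listed capabilities would be assigned only the lowest objective, which corresponds to the remedial case described above.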

Achievement demonstration. A second role for performance measures is
closely allied with the first. They are expected to function as the means
whereby the student demonstrates achievement of each learning objective.
In the instructional procedures being planned by the project, each learning
assignment to a student would include a statement of the end performance to
be demonstrated, the important conditions under which the demonstration is
to take place, and the criteria by which the performance is to be judged.
The student would take the test on a learning unit when he believed himself
able to pass. If he succeeded, he would progress to another learning task.
If he did not pass, he would return to the same assignment, or to remedial 
or alternative assignments as necessary, until he could demonstrate that he 
had accomplished the assigned learning. The performance measures thus would 
function as the means by which students demonstrate at each step their 
readiness to progress in the curriculum. 

Occupational certification. Performance measures are expected to play
an additional role in vocational areas of the curriculum. Thus, it is
planned that as a student passes each test, he thereby provides evidence
that he has a capability required for competence in one or more occupations.
When he has demonstrated all of the required capabilities, he is eligible
for certification by the school as competent in an occupation. The vocational
performance measures thus would be the means by which students
earned official recognition of their marketable capabilities. A student
might qualify for several certificates in the course of his secondary 
education. Normally, he would be awarded only the last (or highest level) 
one earned, though any earned certificate could be supplied on the basis 
of his record. 

Retention and generalization. Each of the roles described thus far
is a measurement primarily of capability deliberately acquired, and is
taken at the point of first mastery. That is, the student works on acquisition
of a capability and then promptly demonstrates his mastery of it
on a test designed for that purpose. It is reasonable, however, to measure
two additional aspects of a student's capabilities at selected points in his
development. Such a point might be at the completion of a number of assignments
which are coordinate objectives, all prerequisite to a major learning
task. Since these prerequisite learnings would be accomplished over a
period of some time, it might be important to verify the student's retention
of these previously demonstrated capabilities before he went on. Any important
deficiency then could be remedied before the student attempted the
major learning task for which his area of deficiency was a prerequisite.

In addition to verifying the retention of previous learning, different 
performance measures could be introduced at selected points to assess the 
student's ability in areas not covered directly by his previous assignments. 
Such tests of the generalization of learning would be useful in deciding 
whether a student would profit more from assignments designed to broaden 
his capabilities in some portion of a course of study or more from proceed- 
ing to advanced levels of the sequence. These measures also would provide 
information about the extent to which the curriculum contributed to the 
achievement of educational outcomes other than those specified by the objec- 
tives. This matter is related to a later discussion of the role of perfor- 
mance measures in curriculum evaluation. 

Orientation and motivation. It is expected that an objective and its
passing requirement stated in advance for the student in performance terms
would provide him with an unusually clear goal which may be attained in a 
modest amount of time. Since the relation of each individual objective to 
the student's longer range goals would be demonstrable through a sequence 
of learning objectives, the necessity and relevance of each achievement
should be clear. Further, the outcome of each test of the student's learn- 
ing would be clear to the student and to the teacher. The combination of 
clearly-defined, relevant requirements, unambiguous evaluation, and frequent 
opportunity to achieve is expected to enhance the student's motivation for
learning.

Evaluation and sequencing of learning units. In the curriculum development
process, every effort is being made, of course, to devise effective
learning units and to arrange them in hierarchical sequence. However, it
is easy to err in this process. Ineffective units can appear and units can
be arranged in erroneous sequences when the development depends on rational
analysis only. The performance data collected during tryout of the curriculum
are expected to provide an empirical basis for evaluation of the
effectiveness and sequencing of the units. Such findings as unexpectedly 
long times to complete units, repeated failure to pass the unit tests, and 
large proportions of failed first attempts all would indicate defective 
units. The sequence in which units are arranged also can be evaluated from 
the results of unit performance tests (Gagne, 1966). The result expected 
from a proper sequence is that all, or nearly all, of the students passing 
a unit also would pass units presumed to be its prerequisites. Pass-fail 
data arranged in a student-by-unit matrix and data on proportions of stu- 
dents passing each unit provide evidence as to the tenability of the initial 
sequence, possible rearrangements, and the need for additional units. Per- 
formance measures therefore are expected to play an important role during 
the tryout period by facilitating the evaluation and revision of the experi- 
mental units and their sequential arrangement. 
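
The use of a student-by-unit matrix can be illustrated with a small sketch. The data, unit names, and presumed sequence below are hypothetical and serve only to show the two tabulations mentioned above: the proportion of students passing each unit, and, for each presumed prerequisite pair, the share of students passing a unit who also passed its prerequisite. A share well below 1.0 would call the presumed ordering into question.

```python
# Hypothetical tryout data: rows are students, columns are units (1 = passed).
UNITS = ["A", "B", "C"]
RESULTS = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 1, 0],
    [1, 1, 1],
]
# Presumed sequence: A is prerequisite to B, and B is prerequisite to C.
PRESUMED = [("A", "B"), ("B", "C")]


def pass_proportions(units, results):
    """Proportion of students passing each unit."""
    return {
        unit: sum(row[i] for row in results) / len(results)
        for i, unit in enumerate(units)
    }


def prerequisite_support(units, results, presumed):
    """Share of students passing each unit who also passed its presumed prerequisite."""
    index = {unit: i for i, unit in enumerate(units)}
    support = {}
    for prereq, unit in presumed:
        passed_unit = [row for row in results if row[index[unit]] == 1]
        if passed_unit:
            hits = sum(row[index[prereq]] for row in passed_unit)
            support[(prereq, unit)] = hits / len(passed_unit)
        else:
            support[(prereq, unit)] = None  # nobody passed the unit; no evidence
    return support


print(pass_proportions(UNITS, RESULTS))
print(prerequisite_support(UNITS, RESULTS, PRESUMED))
```

In this invented example every student who passed C also passed B, which supports that ordering, while one student passed B without passing A, which suggests the A-before-B assumption deserves a second look.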

Evaluation of curriculum effectiveness. Clearly, the performances of
students on measures of their capability for tasks defined as learning 
objectives are basic data inputs to curriculum evaluation. They provide an 
answer to the fundamental question, "Did students learn what we intended 
that they learn?" But many other questions must be asked in evaluating the 
effectiveness of the curriculum, and some of these questions will require 
that other data be gathered on students' capabilities. A previous section
identified the need for performance measures designed to assess the extent
to which the curriculum provides extra values through generalization of
learning and through acquisition of "incidental" skills and knowledges.
It is planned that such measures will be used and that they will contribute
to the evaluation of the extent to which many important educational objectives,
not stated as specific objectives for the curriculum, are met
(Cronbach, 1964). 

Summary. Performance tests of the capabilities designated as learning
objectives for the student are expected to play important roles in:

• diagnosing the initial learning status of each student
  and prescribing individually appropriate sequences
  of assignments.
• demonstrating that unit objectives have been met and
  that the student is ready to proceed to another unit.
• certifying students in occupations.
• verifying retention of previously demonstrated capabilities.
• orienting students to and motivating them for learning.
• evaluating individual learning units and their sequencing.
• assessing curriculum effectiveness.

Other performance measures are expected to be used in assessing the gener- 
alization of learning, the acquisition of "incidental" skills and knowledges, 
and the extent to which other important educational outcomes are achieved. 


Characteristics of the Measures 

The characteristics needed in a measurement depend upon the uses for 
which the measure is intended and upon the operational conditions under which 
the measurement will be taken and used. The preceding sections have described 
both the uses and the operational conditions planned by Project ABLE. This 
section will consider the implications of those conditions and uses for the
kinds of measures we need. 
Types of decisions facilitated. Cronbach (1960, 1964) has distinguished
between tests in education according to the kinds of decisions they are
expected to facilitate. Thus, tests are used to make selection and classification
decisions about individuals, to evaluate and revise curricula, to
make decisions for administrative regulation, and to test scientific hypotheses.
Glaser and Klaus (1962) present a similar analysis and also describe quality 
control and system evaluation functions, which can be considered mixtures of 
Cronbach's decision types. In Project ABLE, the largest number of decisions 
by far will concern individual students. Tests will be used in decisions as 
to whether a student is ready to enter a sequence of study, which of the 
available assignments he should attempt, whether he has met the objective of 
a particular assignment and should be given additional work, or whether he 
needs to repeat an assignment. The results from these tests also will be 
major input data during the tryout period for decisions about the curriculum, 
including the revision and sequencing of learning units. Later, they will 
assist in evaluating the effectiveness of the curriculum. 

Tests used for decisions about individuals differ from other tests in 
two major respects. First, decisions about curriculum or about administrative 
matters usually can be based upon means of test data from samples of students. 
Not all students must be tested and unsystematic errors made in measuring the 
capabilities of individual students need not affect the appropriateness of 
decisions, since these errors tend to offset each other in the averaging 
process. When decisions are to be made about individuals, however, errors 
in measuring the capabilities of those individuals must be minimized.
Secondly, it is more important in individual decision situations that the
assessment be recognizable by the student as a fair and adequate measure of 
capabilities relevant to his educational goals. These two characteristics, 
individual accuracy and recognizable adequacy, will be important considera- 
tions in the following pages. 
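
The first point, that unsystematic errors tend to offset each other in averaging, can be shown with a small simulation. This is an illustrative sketch under assumed values only: the number of students, the size of the measurement error, and the normal error model are all hypothetical choices, not project data.

```python
import random
import statistics


def simulate_errors(n_students=400, error_sd=10.0, seed=1966):
    """Compare the typical error in one student's score with the error in the group mean.

    Errors are drawn as unsystematic (zero-mean) normal deviates; this error
    model is an assumption made for illustration.
    """
    rng = random.Random(seed)
    errors = [rng.gauss(0.0, error_sd) for _ in range(n_students)]
    typical_individual_error = statistics.mean([abs(e) for e in errors])
    error_in_group_mean = abs(statistics.mean(errors))
    return typical_individual_error, error_in_group_mean


individual, group = simulate_errors()
print(f"typical individual error: {individual:.2f}")
print(f"error in group mean:      {group:.2f}")
```

Under these assumptions the group mean is disturbed far less than a typical individual score, which is why a test adequate for curriculum decisions may still be too inaccurate for decisions about a single student.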

Since the majority of tests must be devised to facilitate decisions 
about individuals, and since the data from these tests will serve secondarily 
as a major part of the basis for decisions on curriculum revision and admin- 
istrative regulation, we will direct our attention in the remainder of this 
report to the characteristics needed for such tests. Insofar as additional 
proficiency measures are required for curriculum evaluation purposes, they
are best selected or devised and discussed as part of an integrated evaluation
program, which will be the subject of a later report in this series. It
might be noted that it would be inappropriate to develop the number and
kinds of tests needed for individual decision if only curriculum and administrative
decisions were required. However, since these more demanding
tests must be developed by the project, no inefficiency is incurred and
their use for purposes other than individual decisions involves no technical
hazard.

General and procedural characteristics. The curriculum structure and
instructional methods described earlier require that a very large number
and variety of tests be devised. Achievement of any course objective is
expected to require prior acquisition of numerous and diverse capabilities. 
Even though an end-of-course objective might be met by demonstrating the 
ability to produce some kind of machined part, for example, the constituent 
capabilities which must be acquired first may well call upon a wide range 
of psychological processes, response patterns, and stimulus contexts. 
Appropriate tests of the student’s capabilities during the learning process 
must be equally diverse. Our tests must assess the capability for which 
training was devised, using whatever supporting materials and conditions 
are appropriate. We would expect to use paper-and-pencil, equipment, job
samples, oral reports, simulation, or whatever is essential, being guided
in our choice by the stimulus context, the psychological processes, and the 
response modes demanded by the performance objective. 

Not only will the tests exist in great diversity, but they will be
administered and interpreted by many different teachers. Further, results
from the tests will be collected and analyzed by a separate research staff
concerned with decisions other than the assessment of individual students.
These requirements make it clear that procedures for administering and
scoring each test should be standardized and that the test results should
depend only minimally on the observer. Through standardization and objectivity
we may hope to succeed in orienting and motivating the students, in
providing fair and unambiguous results acceptable to student and teacher,
and in providing reports on learning achievement which are sufficiently
dependable for our operational and research purposes.
Validity. The point has been made several times in this paper that it
is important that tests of individual capability measure performances which
are relevant to the students' educational objectives. This is the essential
question of test validity. A test is valid to the extent that it does the 
job it was intended to do, in this case to report on each student's achieve- 
ment of the stated objectives. 

Long-term objectives for the student, stated at the outset of the project, 
included vocational competence, responsible citizenship, and self-fulfillment. 
These objectives were reasonable and useful as goals, but they were not 
satisfactory as working objectives for curriculum development or as criteria 
for achievement test validation. First, of course, though our "real" interest
might be in the student's performance after graduation, as indicated by
these goals, we could not wait several years for students to demonstrate 
their accomplishments. Secondly, the goals as stated lacked specification 
in terms of the performance capabilities they are intended to imply. As 
Cureton (1951, p. 641) points out, such goals are merely labels representing
abstract concepts and summarizing the behavior of persons whose actions 
within some defined series are characteristically successful. Objectives 
were needed which were closer in time to student learning and which specified 
the actions or performances which define the concepts of vocational compe- 
tence, responsible citizenship, and self-fulfillment. It was apparent that 
our ultimate goals could not be used directly as criteria against which to 
validate our curriculum or our achievement tests. 

Using a procedure described elsewhere (American Institutes for Research, 
1965b), specific objectives were derived from the general goals by logical
procedures. These more proximate, intermediate objectives describe the 
capabilities to be acquired by students in units and in courses of learning. 
They constitute a definition of the capabilities which our analysis indicates 
are essential to achievement of the long-range goals and which are feasible 
objectives within the public school context. The statements of learning 
objectives include a description of the performance, the criteria for judging 
success, and the important conditions under which the performance is to take 
place. The objectives are intended to imply directly how achievement should 
be measured. Thus, the topic and course objectives are the criteria for 
evaluation of test validity. The relevant achievement test for any objective
thus is performance of examples of the criterion task as described in
the learning objective. The empirical relation between achievement of 
curriculum objectives and achievement of the long-range goals, that is, the 
validity of the achievement tests for predicting success in later life, must 
be dealt with in long-term, follow-up studies. It is not considered further 
in this report. 

Since the test performance is intended to be a representative performance
of the criterion task, the question of test validity becomes one of
the representativeness of the test tasks. Thus, if the student's objective
were to be capable of solving sets of simultaneous linear equations, then
particular sets of equations would be needed for the test performance. The
test would be considered valid only if the test equations fairly represented 
the universe of equations described by the objective. In addition, the 
test should be representative of the criterion with respect to important 
conditions of the task. In the example cited, the time allowed for solu- 
tions, the accuracy required in answers, the amount and portions of the 
solution which must be displayed, etc., are possible criterion conditions 
which should be fairly represented in the test. 
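
One practical way to pursue representativeness, offered here only as a sketch and not as the project's method, is to describe the universe of criterion tasks by a few characteristics and to draw the same number of test tasks from every combination. The characteristics chosen below (number of unknowns and type of coefficient) and the task labels are hypothetical.

```python
import random
from itertools import product

# Hypothetical description of the universe of linear-equation tasks.
SIZES = [2, 3]                          # number of unknowns
COEFFICIENTS = ["integer", "fractional"]


def build_universe(per_cell=10):
    """Map each (size, coefficient type) cell to a list of task labels."""
    return {
        cell: [f"{cell[0]}x{cell[0]}-{cell[1]}-{k}" for k in range(per_cell)]
        for cell in product(SIZES, COEFFICIENTS)
    }


def draw_test(universe, per_cell, seed=None):
    """Draw `per_cell` tasks at random from every cell of the universe."""
    rng = random.Random(seed)
    test = []
    for cell in sorted(universe):
        test.extend(rng.sample(universe[cell], per_cell))
    return test


test_tasks = draw_test(build_universe(), per_cell=2, seed=0)
print(test_tasks)
```

Sampling within cells guards against a test composed only of the easiest kind of task; criterion conditions such as time allowed or required accuracy would have to be represented separately in the administration rules.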

The question of the representativeness of test tasks has practical, 
methodological, and theoretical aspects. Thus, criteria can be imagined 
for which a fully representative test would be virtually endless because 
the examples of the criterion task, or the conditions under which the task 
would be performed, are extremely diverse. On the other hand, criteria 
can be written which are so specific as to include only one example or test 
task. As a practical matter, neither criterion serves well as an educa- 
tional objective or as a criterion for measurement. Each would be modified 
to encompass more appropriate amounts of learning and testing. Still, 
these extreme types of criteria raise the theoretical problem of defining 
representativeness and the methodological problem of devising methods for 
selecting a set of tasks demonstrably representative of the criterion. 

These same problems of representativeness or task identity appear in con- 
texts other than achievement testing, notably in the design of training 
devices (Gagne, 195^), in the various applications of system simulation
(e.g., Davis & Behan, 1962), and in the analysis of jobs for core content
(Altman & Gagne, 1965). No formal and generally applicable theory or
method is available for assuring that test tasks are truly representative
of the criterion, though Altman (1966) describes a psychological-process
× content model with interesting possibilities. This does not reduce the
importance of representing the criterion faithfully in our tests of individual
achievement, nor the need for deliberate practical attempts to
assure that the achievement tests are relevant to the criteria.

Reliability. Every measurement errs to some degree in estimating the
true value of the variable measured. Repeated sets of measures of the same
individuals never exactly duplicate one another. Every set of measurements
thus is unreliable to some degree. The degree of unreliability in a mea- 
sure is of practical importance because it determines the confidence with 
which the measure may be used as a basis for decisions. If a measure is 
sufficiently unreliable, it is worthless as a basis for decision, no matter 
how relevant (valid) the test tasks may be with respect to the criterion. 

In an earlier section, it was pointed out that tests used in decisions 
about individuals should evaluate individual performances with less error
than would be tolerable were the same tests to be used only for curriculum 
and administrative decisions. But no precise statement has been or can be 
made at this time as to the minimum level of reliability which must be 
achieved by the measures in this project. Such statements can be developed 
from specifications of the size of test score differences which must be 
detected and of the risks of error which can be tolerated (e,g«, Kelley, 
1927), However, in a practical situation, such as an operating school, 
preset standards for discriminations and risks may be of little value. 
Actions must be taken on the best available basis, even if the risks are 
larger than desired. In the present project, it seems clear that highly 
reliable measures of individual achievement should be a goal for test 
development. Such measures would contribute substantially to the efficiency 
of the curriculum operation and to the fairness with which each student is 
Is dealt. The basic objective should be for students only rarely to repeat 
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or pass a learning unit as the result of testing error. But the goal of 
high reliability should not be achieved through significant sacrifice of 
relevance In the measures nor through Important restriction of the learning 
activities. Fortunately, the curriculum structure and the procedures 
planned In Individualizing Instruction can tolerate some unreliability In 
the achievement tests, Long^tenn difficulty for a student Is unlikely to 
result from an occasional error In evaluating his performance on Individual 
learning units which are relatively short. An erroneous "failure’’ decision 
can be overcome as soon as the student decides he Is ready for retest. An 
erroneous "pass" decision should result In detectable difficulty with the 
next higher learning unit and precipitate correction of the assignment error. 
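
The remark that a minimum reliability can be derived from the size of the
difference to be detected and the tolerated risk of error can be illustrated
with a short calculation. The sketch below is not Kelley's procedure; it
assumes the classical standard error of a difference between two scores,
SD x sqrt(2(1 - r)), and a normal deviate z standing for the tolerated risk.
The function name, the score scale, and the default z of 1.96 are
illustrative assumptions, not project specifications.

```python
def minimum_reliability(difference, score_sd, z=1.96):
    """Smallest reliability r at which `difference` between two scores is at
    least z standard errors of a difference (classical test theory assumed)."""
    if difference <= 0 or score_sd <= 0 or z <= 0:
        raise ValueError("difference, score_sd and z must be positive")
    # Standard error of a difference: score_sd * sqrt(2 * (1 - r)).
    # Solving difference >= z * score_sd * sqrt(2 * (1 - r)) for r:
    r = 1.0 - (difference / (z * score_sd)) ** 2 / 2.0
    # A very large difference can be detected even with r = 0.
    return max(0.0, r)


# Hypothetical score scale with a standard deviation of 10 points.
for points in (5, 10, 20):
    print(points, round(minimum_reliability(points, score_sd=10), 3))
```

Under these assumptions, the smaller the difference that must be detected,
the higher the reliability required, which is why preset standards may be
hard to meet in an operating school.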



Scoring. Objectives for learning units, which serve as criteria,
include specifications for a satisfactory performance. The test of a
student's achievement of the objective is required in this program to
produce only a pass-fail score based on whether his performance meets or
exceeds the specified standard. For purposes of validity and reliability in
measurement, several example performances may be required of the student in
a test, so that considerable data should be available to support the
pass-fail decision and other analyses. But the primary test score need be
only dichotomous. The purpose of each test is to compare a student's
performance with an a priori standard, not to compare his performance with
that of other students or with established norms. The measures required in
this program are an example of "criterion-referenced measures" (Glaser &
Klaus, 1962) which indicate the content of the student's behavioral
repertory without reference to the performance of other persons.
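
The contrast between criterion-referenced and norm-referenced scoring can be
made concrete with a small sketch. It assumes numeric performance scores and
assumes that every sampled performance must meet the standard; the report
does not say how several example performances are to be combined, so that
rule and both function names are hypothetical.

```python
def pass_fail(performances, standard):
    """Criterion-referenced score: pass only if every sampled performance
    meets or exceeds the a priori standard (combination rule is assumed)."""
    if not performances:
        raise ValueError("at least one performance is required")
    return all(score >= standard for score in performances)


def percentile_rank(score, group_scores):
    """Norm-referenced score, for contrast: percent of the group scoring lower."""
    if not group_scores:
        raise ValueError("group_scores must not be empty")
    below = sum(1 for other in group_scores if other < score)
    return 100.0 * below / len(group_scores)


# The pass-fail decision uses only the student's own work and the standard.
print(pass_fail([8, 9, 10], standard=8))
# The percentile rank changes whenever the comparison group changes.
print(percentile_rank(7, [5, 6, 7, 8]))
```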

The pass-fail score requires that students be sorted into only two
groups. Were we to require a finer sorting (say, into low fail, fail,
pass, high pass), a larger number of sorting errors would be expected. If
we required a sort into N groups, where N is the number of students, the
resulting rank ordering of students would be expected to include yet a
larger number of errors. The pass-fail score is expected, therefore, to
produce the minimum error and the highest reliability of the scores which
could be used. While this is not the major reason for its use, it is a
welcome result.
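
The expectation that finer sorting produces more sorting errors can be
checked with a small simulation. This is only a sketch under stated
assumptions: true achievement and measurement error are drawn from normal
distributions, and groups are formed as equal-sized bands by rank rather than
by an a priori standard as in the project. The function name, the error size,
and the seed are hypothetical.

```python
import random


def sorting_error_rate(n_students, n_groups, noise_sd=0.5, seed=0):
    """Fraction of students whose group based on observed scores differs
    from their group based on error-free (true) scores."""
    rng = random.Random(seed)
    true = [rng.gauss(0.0, 1.0) for _ in range(n_students)]
    observed = [t + rng.gauss(0.0, noise_sd) for t in true]

    def groups(scores):
        # Rank the students, then cut the ranking into n_groups equal bands.
        order = sorted(range(n_students), key=lambda i: scores[i])
        assigned = [0] * n_students
        for rank, student in enumerate(order):
            assigned[student] = rank * n_groups // n_students
        return assigned

    true_groups = groups(true)
    observed_groups = groups(observed)
    errors = sum(a != b for a, b in zip(true_groups, observed_groups))
    return errors / n_students


# Two groups, four groups, and a full rank ordering of 1000 students.
for k in (2, 4, 1000):
    print(k, sorting_error_rate(1000, k))
```

Because each two-group band is a union of four-group bands, and each
four-group band a union of ranks, any error in the coarser sort is also an
error in the finer one; the error rate therefore cannot decrease as the
sorting becomes finer.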

Summary. Plans for the structure of the curriculum, for the instructional
procedures, and for the roles to be played by performance measures
require that many diverse tests be devised which are:

• adequate to support educational decisions about 
individual students. 

• reasonably standardized with respect to administration, 
scoring, and interpretation. 

• representative of the universe of tasks defined by the 
objectives for learning units. 

• as reliable as practical constraints permit without 
significant sacrifice of validity. 

• scored by reference to the criteria provided by the 
learning unit objectives. 

Development Procedures 

The discussion of tests so far has considered the educational
arrangements within which tests will be used, the roles they will be
expected to play, and the technical characteristics they consequently must
display. This section considers briefly some major aspects of the
procedures being employed in test development.

It should be noted that only a few of the curriculum areas have reached 
the point of developing proficiency measures as of this reporting date. 
Relatively small amounts of test material could be displayed and our techni- 
cal and operating procedures still are in the shakedown process. However, 
the general outline of our modes of operation and our handling of central 
problems can be described. 

Organizational arrangements. The professional staff of the project
includes 16 members of the faculty of the Quincy Public Schools and three
research people from A.I.R. Each faculty member has responsibility for
curriculum development in one area and is, by training and occupation, a
specialist in his area. Faculty members provide the project with
subject-matter knowledge, technical skills, and teaching experience. Each
faculty person in a vocational area also has work experience
in his area of responsibility. Many Quincy faculty members not assigned to 
the project are nonetheless available to the project for consultation, 
review of products and specific assignments for which they are especially 
qualified. Experienced A.I.R. people provide the project with skills and 
knowledge in methods of research and in educational and psychological 
measurement. Each task in the program is attacked as a cooperative effort 
of these two groups. 



In the development of proficiency measures, research members are
responsible for analysis and definition of the technical requirements for
the measures, for devising or selecting the procedures to be used in
developing the measures, for preparing procedural and technique guides, and
for providing test writers with direction, assistance and technical review.
In working with faculty, research people are especially concerned with the
behavioral aspects of the measures. That is, they attend to the problem of
assuring that the psychological processes, response modes, and stimulus
contexts required in the criterion performance are represented appropriately
in the test task. Faculty specialists are responsible for developing the
test items in accordance with requirements. In this work, they are
especially concerned with knowledge and skill content of the tests.

Standardization, objectivity, formats. The procedures and techniques
employed in preparing the various kinds of test items are standard in test
development practice (cf. Adkins, 1947; Lindquist, 1951). Project research
members have prepared abbreviated guides (examples are shown in Appendices
A and B) for use by the faculty specialists and have augmented these with
instruction, consultation and review of finished items. Objective scoring
is intended for all items, though some complex performances will be evaluated
by use of checklists and, in rare cases, by rating methods. Appropriate
test materials will be supplied to teachers with each set of learning unit
materials and teachers will be instructed in their use so as to enhance
standardization of testing procedures, scoring, and interpretation.

Representativeness. The most difficult development task is to assure
that test tasks do in fact provide a representative demonstration of the
capabilities defined as objectives for learning. As mentioned in an earlier
section, no formal method or theory is available whereby test
representativeness is guaranteed. Consequently, we must depend basically
upon the combined judgments of research and faculty people to produce valid
measures. Though risk is involved in this logical process, the nature of the
objectives and the systematic use of a partial frame of reference help to
objectify the procedure. The problem can be described in the following two
parts.

1. Assuring that the test task is an example of the criterion
performance.

The appropriate test task is quite apparent in many instances. For 
example, achievement of an objective which states, with appropriate addi- 
tional specification, that the student should be able to measure voltage 
using a given meter is assessed by requiring the student to do exactly that. 
Similarly, appropriate examples are written rather easily for such common 
objectives as solving equations, listing causes, punctuating, or reciting 
physical laws. 

In other instances, test tasks are less easily certified as examples
of the criterion. Usually, this is because the behavioral statement in the
objective is not sufficiently specific. Consider a fictitious objective
which requires that the student know the proper nomenclature for each part
of machine X. Should the test require the student to write from memory a
list of the parts? Mark the names of parts in a longer list? Say the
correct name when the instructor points to the part? Write the names of
parts indicated in a picture of the machine? Several of these? Does it
matter which is used? It is clear that these possible test situations
involve different psychological processes, different response modes, and
different stimulus presentations, though all are directed at the same
"knowledge" (nomenclature of the parts). Unless the statement of criterion
behavior provides the relevant information, there is no basis for choice
among the possible item types or for asserting that any of the items is the
task intended by the objective.

In practice, the appearance of such an item-objective pair would
result in review and restatement of the objective so that the appropriate
examples could be identified. But the essential requirement for assuring
that test situations are examples of the criterion is a standard frame of
reference defining the dimensions of important variation among performances.
It is helpful to know that one should compare the test and criterion
performances with respect to their response modes, psychological processes,
and stimulus contexts, but what is a useful way to distinguish between, say,
response modes? When are two modes functionally equivalent? When does
success in one mode imply capability in the other? In the absence of a
comprehensive basis for such decisions, we must rely upon persons experienced
in the analysis of behavior to render judgments with respect to specific
items. Considerable assistance is provided in part of this task by
reference to the hierarchical categories of psychological processes proposed
by Altman (1966), who also has described the relations between these
processes, the learning categories of Gagne (1965) and others (Melton, 1964),
and classes of behavioral error.
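
The three dimensions of comparison named above can be recorded in a simple
structure so that reviewers note, item by item, where a test task departs
from the criterion. The sketch below is illustrative only: the class, the
function, and the category labels are hypothetical, and exact string equality
is a crude stand-in for the expert judgment of functional equivalence on
which the project actually relies.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class Performance:
    psychological_process: str  # e.g. "recall" or "recognition"
    response_mode: str          # e.g. "oral naming" or "written naming"
    stimulus_context: str       # e.g. "actual machine" or "picture of machine"


def mismatched_dimensions(criterion, test_task):
    """Names of the dimensions on which the test task differs from the
    criterion performance (equality is an assumed simplification)."""
    return [
        f.name
        for f in fields(Performance)
        if getattr(criterion, f.name) != getattr(test_task, f.name)
    ]


# Hypothetical reading of the "nomenclature of machine X" objective.
criterion = Performance("recall", "oral naming", "actual machine")
item = Performance("recall", "written naming", "picture of machine")
print(mismatched_dimensions(criterion, item))
```

A non-empty result does not by itself disqualify an item; it flags the
dimensions on which a reviewer must decide whether the two performances are
functionally equivalent.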

In passing it should be noted that one type of error frequently detected
in item review (and not only in this project) is to specify a test item
which requires a verbalization about the desired performance rather than the
performance itself. Thus, instead of requiring the student to make a machine
set-up as required by the objective, he might be asked to list the steps in
that procedure. Frequently, especially in "academic" areas, verbalizations
are perfectly appropriate objectives and are properly tested for by asking
for verbal performance. In some cases, verbalizations about or simulations
of non-verbal performance are the only reasonable ways to estimate
achievement of an objective. Proper behavior under certain emergency
conditions would be an example of such instances. But there are many
instances in which a verbal description of the performance is by no means
equivalent to the performance itself and our tests are intended to exclude
errors of this sort.

2. Assuring that the test tasks adequately represent the universe of
examples.

Assuming that a properly stated objective defines a capability for a
class of performances, the test should include demonstration of capability
across the important varieties of the performance. For example, if one
data processing criterion capability were to perform all standard card
sorting operations with a particular class of machines, we would want to
be sure that the student could sort alphabetically as well as numerically,
could produce the major types of card sorts, could handle large as well as
small numbers of cards, etc. This is the problem of representing in a few
test items all of the important specific performances of which the student
should be capable if he has met the learning objective.

It is not necessary or desirable in all instances to include routinely
every minor variation in tasks. Often, testing near the extremes of a
dimension of variation or including all important dimensions in one task is
more efficient and quite adequate. The important consideration is whether,
having succeeded at the test tasks, the student has demonstrated his
capability for performing all important instances of the criterion
capability.
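
The card-sorting example can be restated as a simple coverage check: list the
important dimensions of variation and their values, then see which values no
test task exercises. The following is a minimal sketch; the dimension names,
their values, and the function name are hypothetical, and deciding which
dimensions matter remains the judgment of the faculty specialists.

```python
def uncovered_values(dimensions, test_tasks):
    """Return, for each dimension, the important values that no test task
    exercises. `dimensions` maps a name to a set of values; each test task
    maps dimension names to the value it exercises."""
    missing = {}
    for name, values in dimensions.items():
        seen = {task[name] for task in test_tasks if name in task}
        gap = set(values) - seen
        if gap:
            missing[name] = gap
    return missing


# Hypothetical dimensions for the card-sorting capability.
dimensions = {
    "sort_key": {"alphabetic", "numeric"},
    "deck_size": {"small", "large"},
}
tasks = [
    {"sort_key": "numeric", "deck_size": "small"},
    {"sort_key": "alphabetic", "deck_size": "small"},
]
print(uncovered_values(dimensions, tasks))
```

Here the two tasks cover both sort keys but never a large deck, so one more
task near that extreme would complete the sample.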

We are dependent largely upon the knowledge of faculty specialists and 
their colleagues in identifying the important dimensions of variation among 
tasks. Item review by the research staff and discussion with the specialists 
provide a check on the adequacy of task sampling. The hazard is that some
tasks may be omitted in favor of more easily tested items or tasks which are
more familiar to the individual preparing the test. The lack of formal
methods and theory to support the comparison of human performances is again
a handicap and is dealt with here in a manner analogous to the procedure
described in the first part of this section on representativeness.

PLANS FOR NEXT QUARTER



The following activities are planned for the quarter ending 30 September
1966.

1. Development of performance measures will continue, becoming 
the primary activity in most curriculum areas. 

2. Selection and development of instructional materials,
methods, aids, and procedures will continue in some areas
concurrent with the development of measures.

3. Selection, organization, and development of curriculum topics
will continue in "academic" areas in accordance with conclusions
reached in faculty meetings with the Advisory Panel during May.

4. Materials, staff training, and implementation arrangements
will be completed for the junior high guidance program and
the program will be initiated.

5. Development of plans for the senior high guidance program
will continue.



APPENDIX A 



PROFICIENCY TEST DEVELOPMENT¹



1. The primary purpose of proficiency measurement is to provide an 
objective assessment of criterion behavior which, in Project ABLE, is 
attainment of a series of educational objectives. These goals have been 
stated as topic objectives relating back to a set of tasks (through course 
objectives) for specifically selected jobs. Since these topic objectives 
have been developed to detailed levels of behavior by asking for each
performance in turn, "What kinds of previously learned capabilities need to
be assumed if the person is to learn this capability under a single set
of learning conditions?", it should now be possible to reflect the criterion
as stated (or implied) in each topic objective into one or more items that 
will measure a student’s success or failure in achieving the specified 
capability or educational goal. 

2. Within the general purpose of assessing criterion behavior, we 
have two associated reasons for measurement. The first is to assess present 
performance for initial placement into that point of the curriculum which 
the student is capable of completing satisfactorily without retracing or 
repeating previously acquired skills. The second is to determine performance 
adequacy of terminal behavior specified as the final training product; the 
proficiency test in this case may sample situations other than those
explicitly covered in training so as to evaluate the extent to which specific
behaviors have been generalized to a variety of potential job situations.

3. The ease with which the development of a proficiency measure can
be carried out is dependent upon (1) the complexity of the behavior involved,
(2) the explicitness with which the behavior has been defined, and (3) the
accessibility of the behavior to observation. Because our effort from the
start has been directed toward specific job-oriented capabilities, it may
be feasible and desirable to develop proficiency tests directly from topic
objectives prior to specification of curriculum content and selection of
materials which in themselves will be oriented toward the same capabilities
as the proficiency tests.

¹ Adapted from Adkins, Dorothy C. Construction and analysis of achievement
tests. Washington: U. S. Government Printing Office, 1947.

4. To provide usable information, the proficiency measures must be
objective and quantified. They may be in the form of test items (probably
multiple choice recognition to simplify the quantification), checklists
covering demonstrated performance, or rating scales (the least reliable or
desirable). To meet our stated needs, the measures will be criterion
referenced (an absolute standard of proficiency to be met by each individual
student) rather than norm referenced (each student compared to other
students). Thus the items will not be written at varying levels of
difficulty. They must be directed, however, to the outcomes specified in the
topic objectives, or at least to the desired capabilities if these are not
clearly specified in the topic objectives. The proficiency measure will
tell both the student and the teacher whether a capability has been achieved,
and that knowledge needed to proceed has been acquired; it may also be used
later in specifying curriculum content. The test items must be given
thoughtful, careful writing to satisfy the requirements for their several
later applications.

5. A procedural outline that may help in preparing proficiency measures
is provided below.

a. Begin with the lowest skill level job and work successively
one job at a time to the highest level. Since jobs were
selected to build on the skills of prior levels, this will
give an initial sequencing of capabilities.

b. For each topic objective written to each job, note carefully
the critical behavior (capability to be established)
and prepare two or more items by which student achievement
of each capability can be evaluated. More than one
item is required to increase accuracy of measurement
(allow for possible "bad" items).

c. Since topic objectives were completed in varying degrees
of acceptability with respect to specificity and level
of capability, it may be necessary to review content of
each test item to see if a previously unidentified (no
topic objective written) capability is assumed. (In
the attached sample, the first item was written to a
specific topic objective on selecting the proper grinding
wheel; on inspection another item was seen as necessary to
establish a capability with reference to tensile strength
of metals.)

d. Write each item on an individual sheet, 5" x 8" in size,
to facilitate later sequencing, editing, and typing. Be
sure to complete all identifying data. Use the reverse
side of the form for continuation of an item if necessary,
but only one item to a sheet.

e. Indicate by use of a check or star the correct alternative;
use upper case letters to designate the alternatives.

f. For multiple choice items, use four or five choices whenever
possible; make the distractors (incorrect responses)
plausible and logical, but clearly incorrect (no trick
questions).

g. In the writing of items, should a desired capability come
to mind for which no topic objective was previously prepared,
develop the necessary test items, and in the space
for T.O. number write the word "New."

h. Proceed through the hierarchy of job skills and jobs in
sequential order.

i. If assessment of goal attainment is by means of a
checklist or rating scale, the test item form sheet
will be used to establish the objective quantified
measurement scale by which criterion performance will
be evaluated.

6. Some principles that will help in the construction of items to
measure proficiency:

a. The item as a whole should be realistic and practical;
it should call for knowledge the student must use, or
present a problem he may have to solve on the job.

b. The item should deal with an important and useful aspect
of the job; leave out the trivia and useless information.

c. The item should be phrased in the working language; do
not copy it from a manual or other test.

d. The item should be concerned with a capability required
by the job.

e. Each item should be independent of other items; it should
not be possible to answer an item based only on the
content of some other item.

f. The item should be specific and deal directly with the
job.

g. The central problem (item stem) should be clear and
concise.

h. The problem should be stated accurately and precisely.

i. The problem should include all of the information needed
but should be stated briefly.

j. The problem should contain only material relevant to its
solution.

k. The distractors should be important, plausible answers;
they should present common errors and misconceptions
rather than trivial, illogical alternatives.



SAMPLE FORM 6 (Proficiency Test Item)

VOC. AREA: Metals and Machines        TASK NO.: 9
FAMILY:    Machines                   C.O. NO.: 1
JOB:       Surface Grinder            T.O. NO.: 2

To grind a material of low tensile strength, it would be best to use the
____ abrasive type wheel.

  A - Aluminum Carbide
  B - Aluminum Oxide
  C - Ferric Oxide
* D - Silicon Carbide


FORM 6 (Proficiency Measures)         Project ABLE

VOC. AREA: Metals and Machines        TASK NO.: 9
FAMILY:    Machines                   C.O. NO.: 1
JOB:       Surface Grinder            T.O. NO.: New

Identify the one set of metals that includes no metal of high tensile
strength.

  A - Lead, aluminum, tungsten
  B - Aluminum, brass, steel
* C - Brass, cast iron, magnesium
  D - Copper, molybdenum, bronze
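
The identifying data and format rules for Form 6 lend themselves to a simple
mechanical check before item review. The sketch below is one possible
rendering, not a project procedure: the record layout, field names, and
function name are hypothetical, and the "four or five choices" rule is
applied strictly here although the guide says only "whenever possible."

```python
from dataclasses import dataclass


@dataclass
class ProficiencyItem:
    voc_area: str
    family: str
    job: str
    task_no: str
    co_no: str
    to_no: str          # topic objective number, or "New"
    stem: str
    alternatives: dict  # upper case letter -> alternative text
    correct: str        # letter of the starred alternative


def check_item(item):
    """Return a list of format problems; an empty list means none found."""
    problems = []
    for name in ("voc_area", "family", "job", "task_no", "co_no", "to_no"):
        if not getattr(item, name).strip():
            problems.append("missing " + name)
    if not 4 <= len(item.alternatives) <= 5:
        problems.append("use four or five alternatives")
    if any(not (len(k) == 1 and k.isupper()) for k in item.alternatives):
        problems.append("designate alternatives with upper case letters")
    if item.correct not in item.alternatives:
        problems.append("correct alternative not marked")
    return problems


# The second sample item above, entered as a record.
sample = ProficiencyItem(
    voc_area="Metals and Machines", family="Machines", job="Surface Grinder",
    task_no="9", co_no="1", to_no="New",
    stem="Identify the one set of metals that includes no metal of high "
         "tensile strength.",
    alternatives={
        "A": "Lead, aluminum, tungsten",
        "B": "Aluminum, brass, steel",
        "C": "Brass, cast iron, magnesium",
        "D": "Copper, molybdenum, bronze",
    },
    correct="C",
)
print(check_item(sample))
```

Such a check covers only format; whether the distractors are plausible and
the stem is clear still requires review by the faculty and research staff.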



APPENDIX B 



DESCRIPTION OF TYPES OF QUESTIONS¹



In the writing of test items, knowledge may be measured in several 
ways. Some knowledge can be approached in only one way, but as job behaviors 
are sampled at successively higher levels, it may be not only appropriate, 
but even necessary to write several items around a single capability (topic 
objective), each asking a different kind of question about that single 
behavior. The material below is intended to be descriptive and illustrative 
of the kinds of tasks that can be set by multiple-choice items. 

Types of Items

1. Definition:
   What means the same as ____?
   Which of the following expresses the principle of ____ in different
   terms?

2. Purpose:
   What purpose is served by ____?
   What is the function of the ____?
   Why is this operation performed ____?
   What is the main reason for ____?
   Which of the following is an example of ____?

3. Cause:
   What is the cause of ____?
   Under what condition is ____ true?

4. Effect:
   What is the effect of ____?
   If ____ is done, what will happen?

5. Association:
   What will occur at the time ____?

6. Recognition of Error:
   Which of the following represents an error in ____?

7. Identification of Error:
   What kind of error is ____?
   What name is given to the error in ____?
   What principle is violated when ____?

8. Evaluation:
   What is the best way to evaluate ____?
   For what reason is ____ the best evaluation?

9. Difference:
   What is the most important difference between ____?
   What feature best differentiates ____?

10. Similarity:
    What single characteristic makes for similarity between ____?

11. Arrangement:
    What is the proper order to meet the required sequence?
    Which of the following comes first in operating the ____?
    What is the next step after ____?

12. Incomplete Arrangement:
    What step has been omitted from ____?
    Where should step ____ be placed?

13. Common Principle:
    The following items except one are related by a common principle:
    What is the principle?
    Which does not belong?
    Which of the ____ could be substituted to make all items related?

14. Controversial Subjects:
    Although there is not complete agreement on ____, what is the primary
    reason given by those who do support its desirability?

¹ Adapted from Mosier, C. I., Myers, M. C., & Price, Helen G. Suggestions
for the construction of multiple-choice test items. Educational and
Psychological Measurement, Vol. 5, No. 3, Autumn 1945, as reported in Adkins,
Dorothy C. Construction and analysis of achievement tests. Washington:
U. S. Government Printing Office, 1947.